Current Issue: October-December 2023, Issue 4, 5 Articles
Visual speech recognition (VSR) is a challenging task that aims to interpret speech based solely on lip movements. Although remarkable results have recently been achieved in the field, the task remains an open research problem due to challenges such as visual ambiguities, intra-personal variability among speakers, and the complex modeling of silence. These challenges can be alleviated when the task is approached from a speaker-dependent perspective. Our work focuses on adapting end-to-end VSR systems to a specific speaker. Hence, we propose two adaptation methods: one based on the conventional fine-tuning technique and the other on so-called Adapters. We conduct a comparative study in terms of performance while considering deployment aspects such as training time and storage cost. Results on the Spanish LIP-RTVE database show that both methods obtain recognition rates comparable to the state of the art, even when only a limited amount of training data is available. Although it incurs a deterioration in performance, the Adapters-based method offers a more scalable and efficient solution, reducing the training time and storage cost by up to 80%.
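For readers unfamiliar with the technique, the sketch below illustrates the general Adapters idea referenced in this abstract: small bottleneck modules inserted into a frozen pre-trained encoder, so that only the adapter weights need to be trained and stored per speaker. The module layout, placement, and dimensions here are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the general Adapters idea for speaker adaptation:
# small bottleneck modules are added to a frozen pre-trained encoder,
# so only the adapter weights are trained and stored per speaker.
# Dimensions and placement are illustrative, not the paper's design.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, dim)     # project back up
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual connection

class AdaptedLayer(nn.Module):
    """Wraps a frozen encoder layer with a small trainable adapter."""
    def __init__(self, layer: nn.Module, dim: int):
        super().__init__()
        self.layer = layer
        for p in self.layer.parameters():
            p.requires_grad = False              # freeze the base model
        self.adapter = Adapter(dim)              # only these weights train

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.layer(x))
```

Because only the adapter parameters are updated and saved, per-speaker storage grows with the bottleneck size rather than the full model size, which is consistent with the training-time and storage savings the abstract reports.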
In this paper, an interactive learning experience is proposed that aims to involve museum visitors in a personalized transmission of cultural knowledge in an active and creative way. The proposed system, called HapticSOUND, consists of three subsystems: (a) the Information Subsystem, where visitors learn about traditional musical instruments; (b) the Entertainment Subsystem, where visitors play serious games to virtually assemble traditional musical instruments from a set of 3D objects; and (c) the Interaction Subsystem, where visitors interact with a digital musical instrument that is an exact 3D-printed replica of a traditional one, equipped with cameras that capture user gestures and machine learning algorithms for gesture recognition. Visitors can interact with the lifelike replica to explore the instrument's capabilities tactilely and aurally, producing sounds guided by the system and receiving real-time visual and audio feedback. Emphasis is given to the Interaction Subsystem, for which a pilot study was conducted to evaluate its usability. Preliminary results were promising, with satisfactory usability, indicating that this is an innovative approach that applies sensorimotor learning and machine learning techniques to playing sounds based on real-time gesture and fingering recognition.
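As an illustration of the kind of real-time loop the Interaction Subsystem describes (camera frames classified into fingerings that trigger sounds), here is a hypothetical sketch; the feature extraction and the classifier are crude stand-ins, not HapticSOUND's actual capture pipeline or trained models.

```python
# Hypothetical sketch of a real-time fingering-recognition loop of the
# kind described: camera frames are reduced to a feature vector and
# classified into a fingering label that triggers a sound. The feature
# extraction and classifier are stand-ins, not HapticSOUND's own.
import cv2
import numpy as np

def extract_features(frame: np.ndarray) -> np.ndarray:
    """Toy feature: a coarse grayscale thumbnail, flattened."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (16, 16))
    return small.flatten().astype(np.float32) / 255.0

def classify(features: np.ndarray) -> int:
    """Stand-in for a trained gesture/fingering model."""
    return int(features.mean() * 4) % 4          # 4 dummy fingering classes

cap = cv2.VideoCapture(0)                        # exhibit camera (assumed)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    label = classify(extract_features(frame))
    print(f"fingering {label} -> trigger corresponding sound")  # feedback
cap.release()
```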
In this paper, we propose imagined speech-based brain wave pattern recognition using deep learning. Multiple features were extracted concurrently from eight-channel electroencephalography (EEG) signals. To obtain classifiable EEG data with fewer sensors, we placed the EEG sensors on carefully selected spots on the scalp. To reduce the dimensionality and complexity of the EEG dataset and to avoid overfitting during training of the deep learning model, we utilized the wavelet scattering transformation. A low-cost 8-channel EEG headset was used with MATLAB 2023a to acquire the EEG data. A long short-term memory recurrent neural network (LSTM-RNN) was used to decode the EEG signals into four commands: up, down, left, and right. The wavelet scattering transformation extracts the most stable features by passing the EEG dataset through a series of filtering stages; filtering was applied to each individual command in the EEG dataset. The proposed imagined speech-based brain wave pattern recognition approach achieved a 92.50% overall classification accuracy, which is promising for designing trustworthy real-time imagined speech-based brain-computer interface (BCI) systems. For a fuller evaluation of classification performance, additional metrics were considered: precision, recall, and F1-score were 92.74%, 92.50%, and 92.62%, respectively.
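The abstract's pipeline (wavelet scattering features followed by an LSTM classifier) was implemented in MATLAB 2023a; the sketch below restates it in Python using kymatio's Scattering1D and a PyTorch LSTM purely for illustration. The trial length, scattering parameters, and hidden size are assumptions, not the paper's settings.

```python
# Illustrative restatement of the described pipeline: wavelet scattering
# features from 8-channel EEG trials, then an LSTM classifier for the
# four commands. The paper used MATLAB 2023a; kymatio/PyTorch and all
# hyperparameters here are assumptions made for this sketch.
import torch
import torch.nn as nn
from kymatio.torch import Scattering1D

T = 1024                                       # samples per trial (assumed)
scattering = Scattering1D(J=6, shape=T, Q=8)   # wavelet scattering transform

class ImaginedSpeechLSTM(nn.Module):
    def __init__(self, n_features: int, n_classes: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)   # up / down / left / right

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)                  # x: (batch, time, features)
        return self.head(out[:, -1])           # classify from the last step

eeg = torch.randn(2, 8, T)                     # batch of 8-channel trials
feats = scattering(eeg)                        # (batch, 8, coeffs, time')
feats = feats.flatten(1, 2).transpose(1, 2)    # -> (batch, time', features)
model = ImaginedSpeechLSTM(n_features=feats.shape[-1])
logits = model(feats)                          # (batch, 4) command scores
```

The scattering transform downsamples the time axis (here by 2^J), which is one way it reduces input dimensionality before the LSTM, in line with the abstract's motivation for using it.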
Interactive computer-based music systems form a rich area for exploring collaborative systems in which sensors play an active role and are central to the design process. The Soundcool system is a collaborative, educational system for sound and music creation as well as multimedia scenographic projects, allowing students to produce and modify sounds and images with sensors, smartphones, and tablets in real time. Because it is a real-time collaborative performance system, each performance is a unique creation. In a comprehensive educational project, Soundcool is used to extend the sounds of traditional orchestral instruments and opera singers with electronics. A multidisciplinary international team participates, resulting in different performances of the collaborative multimedia opera The Mother of Fishes in countries including Spain, Romania, Mexico, and the USA.
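Real-time control of sound modules from smartphones and tablets, as described here, is commonly carried over a network protocol such as OSC. The sketch below, using the python-osc library, illustrates that general pattern only; it is not Soundcool's actual protocol, and the address, host, and port are invented for the example.

```python
# Illustration of the general pattern behind real-time collaborative
# control: a smartphone/tablet sensor value is sent to a sound module
# over OSC. Not Soundcool's actual protocol; the OSC address, host,
# and port below are assumptions for this sketch.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.168.1.10", 9000)  # host running the sound engine
# e.g. map a phone's accelerometer tilt (0.0-1.0) to a filter cutoff
client.send_message("/module/filter/cutoff", 0.42)
```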
Deceptive behaviour is a common phenomenon in human society. Research has shown that humans are poor at distinguishing deception, so studying automated deception detection techniques is a critical task. Most relevant technologies are susceptible to personal and environmental influences: EEG-based technologies require large and expensive equipment, and face-based technologies are sensitive to the camera's perspective; these limitations have constrained the development of applications for deception detection technologies. In contrast, the equipment required for speech-based deception detection is cheap and easy to use, and the capture of speech is highly covert. Building on the application of signal decomposition algorithms in other fields, such as EEG analysis and speech emotion recognition, this paper proposed a signal decomposition and reconstruction method based on empirical mode decomposition (EMD) to process the speech signal; better deception detection performance was obtained by improving the speech quality. Comparison with other decomposition algorithms showed that EMD is the most suitable for our method: across many different classification algorithms, accuracy improved by an average of 2.05% and the F1 score by an average of 1.7%. In addition, a new deception detector, called the TCN-LSTM network, was proposed. Experiments showed that this network combines the strengths of TCN and LSTM for time-series data; the recognition rate of deception detection improved greatly, with the highest accuracy and F1 score reaching 86.2% and 86.0% under the EMD-based signal decomposition and reconstruction method. Future work should further optimise signal decomposition algorithms for speech signals and explore additional classification algorithms for this task.
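To make the decompose-select-reconstruct idea concrete, here is a minimal sketch using the PyEMD library: the speech signal is split into intrinsic mode functions (IMFs), some IMFs are discarded, and the rest are summed back into a signal. Which IMFs the paper keeps or drops, and its selection criterion, are not stated here; dropping the first (highest-frequency) IMF below is an assumption for illustration.

```python
# Sketch of an EMD-based decompose/select/reconstruct step for a speech
# signal, in the spirit of the method described above. Dropping the
# first IMF is an assumption; the paper's selection criterion may differ.
import numpy as np
from PyEMD import EMD   # pip install EMD-signal

def emd_reconstruct(speech: np.ndarray, drop_first: int = 1) -> np.ndarray:
    """Decompose into IMFs, drop the highest-frequency ones, reconstruct."""
    imfs = EMD().emd(speech)              # (n_imfs, n_samples)
    return imfs[drop_first:].sum(axis=0)  # sum the retained IMFs

fs = 16000                                # sample rate (assumed)
t = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 220 * t) + 0.3 * np.random.randn(fs)
clean = emd_reconstruct(noisy)            # would feed the TCN-LSTM detector
```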